N=1:

rank | frequency | n-gram |
---|---|---|
1 | 195295 | -k |
2 | 167408 | -t |
3 | 132493 | -l |
4 | 97287 | -n |
5 | 94780 | -a |

N=2:

rank | frequency | n-gram |
---|---|---|
1 | 49072 | -ek |
2 | 47763 | -ak |
3 | 33002 | -en |
4 | 32044 | -ól |
5 | 31353 | -an |

N=3:

rank | frequency | n-gram |
---|---|---|
1 | 35614 | -nak |
2 | 26778 | -nek |
3 | 24404 | -ban |
4 | 17930 | -ben |
5 | 14573 | -kat |

N=4:

rank | frequency | n-gram |
---|---|---|
1 | 9546 | -ként |
2 | 9259 | -ának |
3 | 6528 | -kkal |
4 | 6506 | -ával |
5 | 6357 | -ában |

N=5:

rank | frequency | n-gram |
---|---|---|
1 | 2840 | -okkal |
2 | 2814 | -ással |
3 | 2639 | -ekkel |
4 | 2171 | -sának |
5 | 2025 | -jának |

The tables show the most frequent letter n-grams at the end of words for N=1…5. The procedure parallels that of section 2.2.5 Most frequent word beginnings, but the aim here is suffix detection rather than prefix detection.
The query used for N=3:
SELECT @pos:=(@pos+1), xx.* FROM (SELECT @pos:=0) r, (SELECT COUNT(*) AS cnt, CONCAT('-', RIGHT(word,3)) AS ngram FROM words WHERE w_id>100 GROUP BY RIGHT(word,3) ORDER BY cnt DESC) xx LIMIT 5;
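
For readers without access to the database, the same counting can be sketched in Python. This is only an illustration of the idea behind the query, not the code used to produce the tables: the function name `top_endings` and the toy word list are hypothetical, and the `w_id>100` filter is not reproduced because it depends on the database's id scheme. As in the query, each word form in the list counts once.

```python
from collections import Counter


def top_endings(words, n, k=5):
    """Return the k most frequent word-final letter n-grams.

    Each result row is (rank, count, "-" + ngram), matching the
    rank | frequency | n-gram layout of the tables.
    """
    # Like SQL RIGHT(word, n), a word shorter than n contributes itself whole.
    counts = Counter(w[-n:] for w in words if w)
    # Sort by descending count; break ties alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return [(rank, cnt, "-" + gram)
            for rank, (gram, cnt) in enumerate(ranked, start=1)]


# Hypothetical toy word list; a real run would read the corpus word list.
sample = ["házban", "kertben", "városban", "embernek",
          "fiúnak", "lánynak", "szobában"]

for row in top_endings(sample, 3):
    print(row)
```

Changing the second argument gives the tables for other values of N, which corresponds to replacing the `3` in both `RIGHT(word,3)` calls of the query.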
2.2.5 Most frequent word beginnings